31 research outputs found

    Apprentissage de la structure de réseaux bayésiens : application aux données de génétique-génomique

    Learning the structure of a gene regulatory network is a complex task, owing both to the high number of variables (several thousand) and to the small number of available samples (a few hundred). Among the approaches proposed to learn these networks, we use the Bayesian network framework: learning a regulatory network amounts to learning the structure of a Bayesian network in which each variable represents a gene and each arc a regulatory relationship between genes. In the first part of this thesis, we study structure learning for generic Bayesian networks using local search. We explore the space of possible networks more efficiently thanks to a new stochastic search algorithm (SGS), a new local operator (SWAP), and an extension of the classical operators that temporarily relaxes the acyclicity constraint of Bayesian networks. The second part focuses on learning gene regulatory networks. We propose a model within the Bayesian network framework that takes two kinds of information into account. The first, commonly used, is gene expression levels. The second, more original, is the presence of mutations in the DNA sequence that can explain variations in expression. The use of these combined data, called genetical genomics data, aims to improve reconstruction quality. Our proposals performed well on simulated genetical genomics data and made it possible to reconstruct a regulatory network from data observed on the plant Arabidopsis thaliana.
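
    To illustrate the acyclicity constraint mentioned in this abstract, here is a minimal Python sketch of the check a classical local-search operator must pass before adding an arc. The function name `creates_cycle` and the adjacency-dictionary representation are assumptions made for illustration; this is not the SGS algorithm or the SWAP operator from the thesis.

```python
from collections import deque


def creates_cycle(children, parent, child):
    """Return True if adding the arc parent -> child would close a directed cycle.

    `children` maps each node to the set of nodes it points to (a hypothetical
    representation chosen for this sketch). A cycle appears exactly when
    `parent` is already reachable from `child`.
    """
    if parent == child:
        return True
    seen = {child}
    queue = deque([child])
    while queue:
        node = queue.popleft()
        for nxt in children.get(node, ()):
            if nxt == parent:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Toy network over three genes: g1 -> g2 -> g3
dag = {"g1": {"g2"}, "g2": {"g3"}, "g3": set()}
```

    On the toy network, adding g3 -> g1 would be rejected because g3 is already reachable from g1, whereas adding g1 -> g3 keeps the graph acyclic.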

    SAGA: Sparse And Geometry-Aware non-negative matrix factorization through non-linear local embedding

    This paper presents a new non-negative matrix factorization technique which (1) decomposes the original data on multiple latent factors that account for the geometrical structure of the manifold embedding the data; (2) provides an optimal representation with a controllable level of sparsity; and (3) has overall linear complexity, allowing large and high-dimensional datasets to be handled in tractable time. It operates by coding the data with respect to local neighbors with non-linear weights. This locality is obtained as a consequence of the simultaneous sparsity and convexity constraints. Our method is demonstrated in several experiments, including a feature extraction and classification task, where it achieves better performance than state-of-the-art factorization methods with a shorter computation time.

    Probing transcription factor combinatorics in different promoter classes and in enhancers

    19 pages. Background: In eukaryotic cells, transcription factors (TFs) are thought to act in a combinatorial way, by competing and collaborating to regulate common target genes. However, several questions remain regarding the conservation of these combinations among different gene classes, regulatory regions and cell types. Results: We propose a new approach named TFcoop to infer the TF combinations involved in the binding of a target TF in a particular cell type. TFcoop aims to predict the binding sites of the target TF from the nucleotide content of the sequences and from the binding affinity of all identified cooperating TFs. The set of cooperating TFs and the model parameters are learned from ChIP-seq data of the target TF. We used TFcoop to investigate the TF combinations involved in the binding of 106 TFs in 41 cell types and in four regulatory regions: promoters of mRNAs, lncRNAs and pri-miRNAs, and enhancers. We first show that TFcoop is accurate and outperforms simple PWM methods for predicting TF binding sites. Next, analysis of the learned models sheds light on important properties of TF combinations in different promoter classes and in enhancers. First, we show that combinations governing TF binding in enhancers are more cell-type specific than those governing binding in promoters. Second, for a given TF and cell type, we observe that TF combinations differ between promoters and enhancers, but are similar for promoters of mRNAs, lncRNAs and pri-miRNAs. Analysis of the TFs cooperating with the different targets shows an over-representation of pioneer TFs and a clear preference for TFs with a binding motif composition similar to that of the target. Lastly, our models accurately distinguish promoters associated with specific biological processes. Conclusions: TFcoop appears to be an accurate approach for studying TF combinations. Its use on ENCODE and FANTOM data allowed us to discover important properties of human TF combinations in different promoter classes and in enhancers. The R code for learning a TFcoop model and for reproducing the main experiments described in the paper is available at https://gite.lirmm.fr/brehelin/TFcoop as an R Markdown file.

    Gene Regulatory Network Reconstruction Using Bayesian Networks, the Dantzig Selector, the Lasso and Their Meta-Analysis

    Modern technologies, especially next-generation sequencing facilities, are giving cheaper access to genotype and genomic data measured on the same sample at once. This creates an ideal situation for multifactorial experiments designed to infer gene regulatory networks. The fifth “Dialogue for Reverse Engineering Assessments and Methods” (DREAM5) challenges aim to assess methods and associated algorithms devoted to the inference of biological networks. Challenge 3, on “Systems Genetics”, proposed to infer causal gene regulatory networks from different genetical genomics data sets. We investigated a wide panel of methods, ranging from Bayesian networks to penalised linear regressions, to analyse such data, and proposed a simple yet very powerful meta-analysis that combines these inference methods. We present the results of the Challenge as well as a more in-depth analysis of the predicted networks in terms of structure and reliability. The developed meta-analysis was ranked first among the teams participating in Challenge 3A. It paves the way for future extensions of our inference method and for more accurate gene network estimates in the context of genetical genomics.
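
    The abstract above does not say how the meta-analysis combines the individual methods. As one plausible illustration only, the Python sketch below averages the rank each method assigns to every candidate edge; the function name `rank_average`, the input format and the toy scores are all assumptions, not the authors' procedure or data.

```python
def rank_average(score_tables):
    """Combine edge scores from several inference methods by averaging ranks.

    `score_tables` is a list of dicts mapping (regulator, target) -> score,
    where a higher score means a more confident edge. Every dict is assumed
    to score the same set of edges (an assumption of this sketch).
    Returns the edges sorted from best to worst mean rank; ties keep the
    order in which edges appear in the first table.
    """
    edges = list(score_tables[0])
    mean_rank = {edge: 0.0 for edge in edges}
    for table in score_tables:
        ordered = sorted(edges, key=lambda e: table[e], reverse=True)
        for rank, edge in enumerate(ordered, start=1):
            mean_rank[edge] += rank / len(score_tables)
    return sorted(edges, key=lambda e: mean_rank[e])


# Made-up scores from three hypothetical methods on three candidate edges
bayes_net = {("a", "b"): 0.9, ("a", "c"): 0.2, ("b", "c"): 0.5}
lasso = {("a", "b"): 0.7, ("a", "c"): 0.1, ("b", "c"): 0.8}
dantzig = {("a", "b"): 0.6, ("a", "c"): 0.3, ("b", "c"): 0.4}
consensus = rank_average([bayes_net, lasso, dantzig])
```

    Working on ranks rather than raw scores sidesteps the fact that scores from different methods are not on a common scale.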

    Reconstruction quality of a biological network when its constituting elements are partially observed

    Unravelling regulatory relationships between biological entities is of utmost importance for understanding the functioning of living organisms. As the number of available samples is often very low (often fewer than one hundred), inference methods are frequently applied to a subset of variables that make sense for the mechanisms under study. Classical remedies are either data driven (e.g., differentially expressed genes) or knowledge driven (e.g., using ontology information). However, whatever the chosen solution, important variables are very likely to be missed by the selection process, which is the issue at stake in the present paper.

    Modeling transcription factor combinatorics in promoters and enhancers

    We propose a new approach (TFcoop) that takes into account cooperation between transcription factors (TFs) for predicting TF binding sites. For a given TF, TFcoop bases its prediction on the binding affinity of the target TF as well as that of any other TF identified as cooperating with it. The set of cooperating TFs and the model parameters are learned from ChIP-seq data of the target TF. We used TFcoop to investigate the TF combinations involved in the binding of 106 different TFs in 41 different cell types and in four different regulatory regions: promoters of mRNAs, lncRNAs and pri-miRNAs, and enhancers. Our experiments show that the approach is accurate and outperforms simple PWM methods. Moreover, analysis of the learned models sheds light on important properties of TF combinations. First, for a given TF and region, we show that the TF combinations governing the binding of the target TF are similar across cell types. Second, for a given TF, we observe that TF combinations differ between promoters and enhancers, but are similar for promoters of distinct gene classes (mRNAs, lncRNAs and miRNAs). Analysis of the TFs cooperating with the different targets shows an over-representation of pioneer TFs and a clear preference for TFs with a binding motif composition similar to that of the target. Lastly, our models accurately separate promoters into classes associated with specific biological processes.

    GIANT: Galaxy-based Interactive tools for ANalysis of Transcriptomic data

    GIANT is a user-friendly tool suite for microarray analysis and for exploring RNA-seq and microarray differential results.

    Inférence de réseaux de régulation de gènes au travers de scores étendus dans les réseaux bayésiens

    Gene regulatory network inference is currently moving towards the joint use of complementary kinds of biological information. Here we use genetic marker data in addition to the usual expression data, within the framework of discrete static Bayesian networks. We compare the quality of several scores as well as the impact of a prior on network connectivity. We propose two models and compare them with existing approaches to gene regulatory network inference. On simulated data, one of our models achieves the best results for small sample sizes. We apply this same model to real data from Arabidopsis thaliana.

    Modélisation de l'expression des gènes à partir de données de séquence ADN

    Gene expression is tightly controlled to ensure a wide variety of cell types and functions. The development of diseases, particularly cancers, is invariably related to deregulation of these controls. Our objective is to model the link between gene expression and the nucleotide composition of different regulatory regions in the genome. We propose to address this problem in a regression framework using a Lasso approach coupled with a regression tree. We use sequence data exclusively and fit a different model for each cell type. We show that (i) different regulatory regions provide distinct and complementary information and that (ii) the information contained in nucleotide composition alone allows gene expression to be predicted with an error comparable to that obtained using experimental data. Moreover, the fitted linear model is not equally accurate for all genes; it better fits certain groups of genes with particular nucleotide compositions.